Detection of speech embedded in real acoustic background based on amplitude modulation spectrogram features

Authors

  • Jörn Anemüller
  • Denny Schmidt
  • Jörg-Hendrik Bach

Abstract

A classification method is presented that detects the presence of speech embedded in a real acoustic background of non-speech sounds. The features used for classification are modulation components extracted by computing the amplitude modulation spectrogram. Feature selection techniques and support vector classification are employed to identify the modulation components most salient for the classification task, which are therefore considered highly characteristic of speech. Results show that reliable detection of speech can be performed with fewer than 10 optimally selected modulation features; the most important ones lie in the modulation frequency range below 10 Hz. Detection of speech in a background of non-speech signals is performed with about 90% test-data accuracy at a signal-to-noise ratio of 0 dB. Compared to standard ITU G.729B voice activity detection, the proposed method yields increased true positive rates and reduced false positive rates induced by a real acoustic background.
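
As a rough illustration of the feature extraction step named in the abstract, the Python sketch below computes a simple amplitude modulation spectrogram: a short-time Fourier transform gives an envelope per acoustic frequency bin, and a second Fourier transform along time gives the modulation content of each envelope. The function name and all parameter values (frame length, hop size, segment length) are assumptions chosen for illustration; they are not taken from the paper, whose exact processing is not described here.

```python
import numpy as np


def amplitude_modulation_spectrogram(signal, fs, frame_len=256, hop=64, mod_win=125):
    """Hypothetical AMS sketch (parameters are illustrative assumptions).

    Returns an array of shape (n_segments, n_acoustic_bins, n_mod_bins)
    and the modulation frequency axis in Hz.
    """
    # Step 1: short-time spectrum magnitude = envelope per acoustic frequency bin.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    envelope = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_acoustic_bins)

    # Step 2: cut the envelopes into segments of mod_win frames each.
    n_segments = n_frames // mod_win
    env = envelope[: n_segments * mod_win].reshape(n_segments, mod_win, -1)

    # Step 3: remove the mean of each envelope, then transform along time.
    env = env - env.mean(axis=1, keepdims=True)
    mod_window = np.hanning(mod_win)[None, :, None]
    ams = np.abs(np.fft.rfft(env * mod_window, axis=1))  # (segments, mod, acoustic)

    # Envelopes are sampled at the frame rate, which sets the modulation axis.
    frame_rate = fs / hop
    mod_freqs = np.fft.rfftfreq(mod_win, d=1.0 / frame_rate)
    return ams.transpose(0, 2, 1), mod_freqs


# Synthetic check: a 1 kHz tone whose amplitude is modulated at 4 Hz.
fs = 8000
t = np.arange(2 * fs) / fs
tone = (1.0 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
ams, mod_freqs = amplitude_modulation_spectrogram(tone, fs)

carrier_bin = 32  # 1000 Hz / (8000 Hz / 256) with the default frame length
peak_mod_freq = mod_freqs[np.argmax(ams[0, carrier_bin])]
```

In this synthetic case the strongest modulation component in the carrier's frequency bin falls at the imposed 4 Hz rate, inside the sub-10 Hz range the abstract highlights. In a detector along the lines described above, each segment's modulation components would be flattened into a feature vector and passed to feature selection and a support vector classifier; that stage is omitted here.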

Related resources

Detecting novel objects in acoustic scenes through classifier incongruence

In this study, a new generic framework for the detection and interpretation of disagreement (“incongruence”) between different classifiers [1] is applied to the problem of detecting novel acoustic objects in an office environment. Using a general model that detects generic acoustic objects (standing out from a stationary background) and specific models tuned to particular sounds expected in the...

On the use of spectro-temporal features for the IEEE AASP challenge 'detection and classification of acoustic scenes and events'

In this contribution, an acoustic event detection system based on spectro-temporal features and a two-layer hidden Markov model as back-end is proposed within the framework of the IEEE AASP challenge ‘Detection and Classification of Acoustic Scenes and Events’ (D-CASE). Noise reduction based on the log-spectral amplitude estimator by [1] and noise power density estimation by [2] is used for sig...

Acoustic Features for Classification Based Speech Separation

Speech separation can be effectively formulated as a binary classification problem. A classification based system produces a binary mask using acoustic features in each time-frequency unit. So far, only pitch and amplitude modulation spectrogram have been used as unit level features. In this paper, we study other acoustic features and show that they can significantly improve both voiced and unv...

Robust speech recognition using the modulation spectrogram

The performance of present-day automatic speech recognition (ASR) systems is seriously compromised by levels of acoustic interference (such as additive noise and room reverberation) representative of real-world speaking conditions. Studies on the perception of speech by human listeners suggest that recognizer robustness might be improved by focusing on temporal structure in the speech signal th...

Investigating modulation spectrogram features for deep neural network-based automatic speech recognition

Deep neural network (DNN) based acoustic modelling has been shown to yield significant improvements over Gaussian Mixture Models (GMM) for a variety of automatic speech recognition (ASR) tasks. In addition, it is also becoming popular to use rich speech representations, such as full-resolution spectrograms and perceptually motivated features, as input to the DNNs as they are less sensitive to t...

Publication date: 2008